Abstract

Background: Chlorella is the representative taxon of Chlorellales in Trebouxiophyceae, and its chloroplast (cp)
genomic information has so far depended only on studies of Chlorella vulgaris and the GenBank
record of C. variabilis. Mitochondrial (mt) genomic information regarding Chlorella is currently unavailable. To
elucidate the evolution of organelle genomes and genetic information of Chlorella, we have sequenced and
characterized the cp and mt genomes of Arctic Chlorella sp. ArM0029B.

Results: The 119,989-bp cp genome lacking inverted repeats and the 65,049-bp mt genome were sequenced. The
ArM0029B cp genome contains 114 conserved genes, including 32 tRNA genes, 3 rRNA genes, and 79 genes
encoding proteins. Chlorella cp genomes are highly rearranged except for a Chlorella-specific six-gene cluster, and
the ArM0029B plastid resembles that of Chlorella variabilis except for a 15-kb gene cluster inversion. In the mt genome,
62 conserved genes, including 27 tRNA genes, 3 rRNA genes, and 32 genes encoding proteins, were determined. The
mt genome of ArM0029B is similar to those of the non-photosynthetic species Prototheca and Helicosporidium. The
ArM0029B mt genome contains a group I intron, with an ORF containing two LAGLIDADG motifs, in cox1. The intronic
ORF is shared by C. vulgaris and Prototheca. The phylogeny of the plastid genome reveals a
close relationship of Chlorella to Parachlorella and Oocystis within Chlorellales. The distribution of the cox1 intron
at 721 supports membership in the order Chlorellales. Mitochondrial phylogenomic analyses, however, indicated
that ArM0029B shows a greater affinity to MX-AZ01 and Coccomyxa than to the Helicosporidium-Prototheca clade,
although the detailed phylogenetic relationships among the three taxa remain to be resolved.

Conclusions: The plastid genome of ArM0029B is similar to that of C. variabilis. The mt sequence of ArM0029B is
the first mt genome to be reported for Chlorella. Chloroplast genome phylogeny supports monophyly of the seven
investigated members of Chlorellales. The presence of the cox1 intron at 721 in all four investigated Chlorellales
taxa indicates that the cox1 intron was introduced in early Chlorellales as a cis-splicing form and that the
cis-splicing intron was inherited by recent Chlorellales and recently became trans-spliced in Helicosporidium.



Background 

Chloroplasts and mitochondria, organelles of higher plants 
and algae, play important roles in energy production, 
photosynthesis, and metabolite production required for 
maintaining life. Although numerous biological functions 
of both organelles rely considerably on proteins imported 
from nuclear encoded genes, understanding the organelle 
genome will have a major impact in the fields of evolution,
biology, and biotechnology.

Currently, many genome projects are in progress for 
green microalgae. To date, more than 20 organelle genomes 
have been completely sequenced in green microalgae 
[1]. Generally, chloroplasts and mitochondria in green 
algae have multiple copies of a single type of circular gen- 
ome. In green algae, various plastid genome sizes have 
been reported: 37.7 kb in the non-photosynthetic alga 
Helicosporidium sp. and 203.8 kb in Chlamydomonas 
reinhardtii [2,3]. Plastid genomes in higher plants and 
green algae encode 88-138 genes [4,5]. Typical plastid ge- 
nomes contain a large inverted repeat (IR) region with 
genes for rRNA, several tRNAs, and proteins. However,
plastid genomes lacking an IR region also have been re- 
ported in some species [6,7]. The size of the mitochondrial 
(mt) genome varies among species: 6 kb in Plasmodium 
to 3,000 kb in the cucumber family [8,9]. The number of 
mt genes also varies: 5 genes in Plasmodium and about 
100 genes in Jakobid flagellates [10]. 

Chlorella species, one of the best-known unicellular 
green algae, was studied in early research on photosyn- 
thesis [11] and is now used as a model and source for 
biotechnology and commercial applications such as use 
as a food additive, feed, and bioenergy source. The 
Chlorella genus belongs to Trebouxiophyceae, one of 
the Chlorophyte groups [12]. Trebouxiophyceae, found 
mostly in soil and freshwater, is a large algal group in- 
cluding Chlorella, Oocystis, Parachlorella, Coccomyxa, 
and Helicosporidium. The availability of organellar gen- 
omic information in Trebouxiophyceae, however, is very 
limited. Plastid genomes of seven species (Chlorella vulgaris
C-27, Chlorella variabilis NC64A, Coccomyxa sp.
C-169, Trebouxiophyceae sp. MX-AZ01, Helicosporidium
sp., Oocystis solitaria, and Parachlorella kessleri) in
Trebouxiophyceae have been sequenced, and they display a
wide range of genome sizes, gene content, and intron
content [13,14]. An IR region is missing in the plastid genomes
of Chlorella vulgaris C-27 [15] and Chlorella variabilis
NC64A (Accession no. NC_015359) but is detected in
most of the Trebouxiophyceae group (Coccomyxa sp.,
Parachlorella kessleri, and Oocystis solitaria). To date, the
complete mt genome sequences have been reported in 
four trebouxiophycean algae, and they show a limited 



range of genome sizes, gene repertoires, and intron 
content. Two of them are non-photosynthetic relatives 
of Chlorella — Prototheca wickerhamii [16] and Helicos- 
poridium sp. [17]. Two others are Coccomyxa sp. C-169 
of Coccomyxaceae [18] and the unclassified Trebouxio- 
phycean alga Trebouxiophyceae sp. MX-AZ01 [14]. 
However, the mt genome of Chlorella species remains 
unknown. 

In the present study, we report the chloroplast (cp) 
and mt sequences of Chlorella sp. ArM0029B, which 
was isolated from drift ice in the Arctic region and has 
features of high lipid accumulation and fast growth at vari- 
ous temperatures [19]. The plastid genome of ArM0029B 
is similar to that of C. variabilis NC64A except for large
inversions and fewer introns. The mt sequence of ArM0029B
reported here is the first mt genome to be reported for Chlorella. We
compared the Chlorella sp. ArM0029B organelle genomes
with those of other Trebouxiophyceae and discuss cp phylogeny and
cox1 intron evolution. The unique features of both organelle
genomes in Chlorella sp. ArM0029B presented here
will provide important insight into the evolution of
organelle genomes within microalgal species and genetic
information for biotechnology.

Results and discussion 

Genomic organization and features of Arctic Chlorella sp. 
ArM0029B 

Figure 1 Plastid and mt genomic maps of Chlorella sp. ArM0029B.

The cp and mt genome sequences of ArM0029B were
assembled as circular molecules of 119,989 bp and
65,049 bp, respectively (Figure 1). However, linear
plastomes, concatenated pieces representing multiple
plastomes (sometimes circular), and even branched forms
have been reported in many species [20,21]. The polymerase
chain reaction (PCR) approach we used would not rule
out linear, concatenated, or branched structures of an
organelle genome. Therefore, we cannot exclude other
complex conformations of the organelle genome in ArM0029B.
The cp genome of ArM0029B contains 114 genes, excluding
the non-conserved open reading frames (ORFs) encoding
over 50 amino acids (Tables 1 and 2). A BLASTP search
against the NCBI NR database revealed that all of the 79
protein-coding genes were conserved (E value < 1E-6),
and only five of them were conserved hypothetical proteins.
We identified 71 additional ORFs using Glimmer
(see Additional file 1: Table S1), but they were not
incorporated into the final gene set because only two of
them showed homology to bacterial hypothetical proteins.
Like C. variabilis NC64A, C. vulgaris C-27, Coccomyxa
sp. C-169, and Trebouxiophyceae sp. MX-AZ01, ArM0029B
does not carry large IRs in the plastid genome,
indicating that all genes are present as a single copy. The
general features and gene lists were compared (Tables 1
and 2). The overall GC content of the cp genome of Chlorella
sp. ArM0029B is low (33.92%), similar to that of C. variabilis
NC64A (33.93%) and C. vulgaris C-27 (31.6%) but in
contrast to that of Coccomyxa sp. C-169 (50.71%) and
Trebouxiophyceae sp. MX-AZ01 (56.25%). The total length of all
114 conserved genes in the plastid genome of ArM0029B
is 64,626 bp, and the genes account for a coding density of
53.8% of the total cp genome sequence. The latter value is
the highest coding density among all reported Chlorella



Table 1 General features of plastid and mt genomes in trebouxiophycean algal species

Chloroplast               Chlorella sp. ArM0029B  C. variabilis NC64A  C. vulgaris  Coccomyxa    Trebouxiophyceae sp.
Length                    119989                  124579               150613       175731       149707
AT contents (%)           66.08                   66.07                68.44        49.29        43.75
# genes                   114                     114                  115*         115          115
# conserved CDS           79                      79                   79           79           79
# tRNA                    32                      32                   33           33           33
# rRNA                    3                       3                    3            3            3
# introns                 1                       3                    3            1            5
Coding density            53.8                    50.5                 38.7         39.9/40.8    47.7/48.5
CDS + strand              57                      57                   26           44           72
rRNA + strand             3                       3                    3            3            3
tRNA + strand             14                      18                   17           17           15
CDS - strand              22                      22                   51           35           8
rRNA - strand             0                       0                    0            0            0
tRNA - strand             18                      14                   16           15           17
Percent of +/-            64.9                    68.4                 40           55.6         77.39

Mitochondria              Chlorella sp. ArM0029B  Prototheca  Helicosporidium sp.  Coccomyxa     Trebouxiophyceae sp.
Length                    65049                   55328       49343                65497         74423
AT contents (%)           71.5                    74.2        74.4                 46.8          46.6
# genes                   62                      61          60                   59            56
# conserved CDS           32                      30          32                   30            30
# tRNA                    27                      26          25                   26            23
# rRNA                    3                       3           3                    3             3
# introns                 I (1)                   I (5)       I (4)                I (1) II (4)  I (7) II (4)
Coding density            50.2                    59.8        64.99                48.38         42.64
CDS on + strand           18                      13          9                    29            29
CDS on - strand           14                      18          23                   1             1
tRNA/rRNA on + strand     17                      12          10                   28            26
tRNA/rRNA on - strand     13                      17          18                   1             0
Percent of +/-            56.5                    59.3        68.3                 96.7          98.2

*The asterisk indicates information without unnamed hypothetical ORFs in the cp sequence of Chlorella vulgaris C-27.
Table 2 Gene list of the plastid genome in trebouxiophycean algal species

Gene          ArM0029B  NC64A  C. vulgaris  Coccomyxa  Trebouxiophyceae sp. MX-AZ01 (NC_018569)
accD          o         o      o            o          o
atpA          o         o      o            o          o
atpB          o         o      o            o          o
atpE          o         o      o            o          o
atpF          o         o      o            o          o
atpH          o         o      o            o          o
atpI          o         o      o            o          o
ccsA          o         o      o            o          o
cemA          o         o      o            o          o
chlB          o         o      o            o          o
chlI          o         o      o            o          o
chlL          o         o      o*           o          o
chlN          o         o      o            o          o
clpP          o         o      o            o          o
cysA          o         o      o            o          o
cysT          o         o      o            o          o
ftsH          o         o      o            o          o*
infA          o         o      o            o          o
minD          o         o      o            o          o
petA          o         o      o            o          o
petB          o         o      o            o          o
petD          o         o      o            o          o
petG          o         o      o            o          o
petL          o         o      o            o          o
psaA          o         o      o            o          o
psaB          o         o      o            o          o
psaC          o         o      o            o          o
psaI          o         o      o            o          o
psaJ          o         o      o            o          o
psaM          o         o      o            o          o
psbA          o         o*     o            o          o*
psbB          o         o      o            o*         o
psbC          o         o*     o            o          o
psbD          o         o      o            o          o
psbE          o         o      o            o          o
psbF          o         o      o            o          o
psbH          o         o      o            o          o
psbI          o         o      o            o          o
psbJ          o         o      o            o          o
psbK          o         o      o            o          o
psbL          o         o      o            o          o
psbM          o         o      o            o          o
psbN          o         o      o            o          o
psbT          o         o      o            o          o
psbZ          o         o      o            o          o
rbcL          o         o      o            o          o
rpl12         o         o      o            o          o
rpl14         o         o      o            o          o
rpl16         o         o      o            o          o
rpl19         o         o      o            o          o
rpl2          o         o      o            o          o
rpl20         o         o      o            o          o
rpl23         o         o      o            o          o
rpl32         o         o      o            o          o
rpl36         o         o      o            o          o
rpl5          o         o      o            o          o
rpoA          o         o      o            o          o
rpoB          o         o      o            o          o
rpoC1         o         o      o            o          o
rpoC2         o         o      o            o          o
rps11         o         o      o            o          o
rps12         o         o      o            o          o
rps14         o         o      o            o          o
rps18         o         o      o            o          o
rps19         o         o      o            o          o
rps2          o         o      o            o          o
rps3          o         o      o            o          o
rps4          o         o      o            o          o
rps7          o         o      o            o          o
rps8          o         o      o            o          o
rps9          o         o      o            o          o
tilS          o         o      o            o+         o+
tufA          o         o      o            o          o
ycf1          o         o      o            o          o
ycf12         o         o      o            o          o
ycf20         o         o      o            o          o
ycf3          o         o      o            o          o
ycf4          o         o      o            o          o
ycf47         o         o      o            o          o
minE          -         -      o            -          -
trnA (UGC)    o         o      o            o          o
trnC (GCA)    o         o      o            o          o
trnD (GUC)    o         o      o            o          o

trnE (UUC)    o oo o
trnF (GAA)    o oo o
trnG (UCC)    o oo o
trnG (GCC)    oo
trnH (GUG)    o
trnI (GAU)    o
trnI (CAU)    o
trnK (UUU)    o
trnL (CAA)    o
trnL (GAG)    o
trnL (UAA)    o*
trnL (UAG)    o
trnM (CAU)    oo
trnN (GUU)    o
trnP (UGG)    o
trnQ (UUG)    o
trnR (ACG)    o
trnR (UCU)    o
trnR (CCG)    o
trnR (CCU)
trnS (GCU)    o
trnS (GGA)    o
trnS (UGA)    o
trnT (UGU)    o
trnT (GGU)    o
trnV (UAC)    o
trnW (CCA)    o
trnY (GUA)    o oo o
rrnL          o o
rrnS          o o
rrn5          o o

*gene containing one intron; **gene containing two introns; ***gene
containing three introns; oo, 2 copies; +, fragmented gene; -, absent.

spp. to date. These results indicate that the cp genome of
ArM0029B is more compact than those of the species compared
above. The mt genome of ArM0029B contains a
total of 62 genes, excluding the ORFs that are not conserved
among Trebouxiophyceae (Table 1). Most of the ORFs encoding
over 50 amino acids are not conserved based on an
NCBI BLAST search. The general features and gene list
of the mt genome of ArM0029B were compared with those of four
Trebouxiophyceae spp., Prototheca wickerhamii,
Helicosporidium sp., Coccomyxa sp. C-169, and Trebouxiophyceae
sp. MX-AZ01 (Tables 1, 2, and 3). The
gene number of the mt genome of ArM0029B is the highest
(62 genes) among the mt genomes of all sequenced species
of Trebouxiophyceae. The overall GC content of the
genome is low (28.5%), similar to that of Prototheca wickerhamii
(25.8%) and Helicosporidium sp. (25.6%) but in
contrast to that of two species with a high GC content,
Coccomyxa sp. C-169 (53.8%) and Trebouxiophyceae sp.
MX-AZ01 (53.4%). All 62 conserved genes on the mt DNA
of ArM0029B cover 32,655 bp in length and account for a
coding density of 50.2% of the total mt genome sequence,
representing an intermediate value compared with all sequenced
species of Trebouxiophyceae.
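
The GC content and coding density values quoted above follow from simple arithmetic on the figures given in the text and in Table 1. The following is a minimal sketch, assuming that GC content is 100% minus the AT content and that coding density is the total length of conserved genes divided by the genome length; the helper names `gc_percent` and `coding_density` are illustrative and do not come from the study. Under these assumptions the cp coding density evaluates to about 53.86%, which the text quotes as 53.8%.

```python
def gc_percent(at_percent: float) -> float:
    """GC content (%) derived from the AT content (%) listed in Table 1."""
    return 100.0 - at_percent


def coding_density(coding_bp: int, genome_bp: int) -> float:
    """Share (%) of the genome covered by conserved genes."""
    return 100.0 * coding_bp / genome_bp


# Values taken from the text: conserved-gene length and genome length (bp).
cp_density = coding_density(64_626, 119_989)
mt_density = coding_density(32_655, 65_049)

# AT contents of ArM0029B from Table 1 (cp: 66.08, mt: 71.5).
cp_gc = gc_percent(66.08)
mt_gc = gc_percent(71.5)

print(f"cp: GC {cp_gc:.2f}%, coding density {cp_density:.2f}%")
print(f"mt: GC {mt_gc:.2f}%, coding density {mt_density:.2f}%")
```
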

In the cp genome, 74 (64.9%) conserved genes
occupy one strand and 40 genes occupy the other
strand, so the gene distribution over the two DNA strands
of the ArM0029B cp genome is biased (Figure 1, Table 1).
The proportion of genes on one strand was
68.4%, 40.5%, and 55.6% in the cp genomes of NC64A, C.
vulgaris C-27, and Coccomyxa C-169, respectively. These
results indicate that gene distribution between the two
strands of the cp genome is biased to some degree but
relatively even in contrast to that of some mt genomes. In
the mt genome of ArM0029B, 35 conserved genes occupy
one strand, and 27 genes occupy the other strand,
indicating that the genes are fairly evenly (56.5:43.5) distributed
over both strands of the ArM0029B mt genome
(Table 1). Two other species, Prototheca wickerhamii
and Helicosporidium sp., showed a more biased distribution
of genes on one strand (59.3% and 68.3%, respectively)
than Chlorella sp. ArM0029B. Furthermore, Coccomyxa
and MX-AZ01 displayed a drastically biased distribution
of the mt genes on one strand (96.7% and 98.2%,
respectively).
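
The strand percentages for ArM0029B can be reproduced from the gene counts given in the paragraph above. This is a small sketch of that calculation; the function name `strand_split` is hypothetical, and only the counts stated in the text (74 and 40 genes for the cp genome, 35 and 27 for the mt genome) are used.

```python
from typing import Tuple


def strand_split(strand_a: int, strand_b: int) -> Tuple[float, float]:
    """Return the percentage of genes on each of the two strands."""
    total = strand_a + strand_b
    return 100.0 * strand_a / total, 100.0 * strand_b / total


# Conserved-gene counts per strand for ArM0029B, as stated in the text.
cp_major, cp_minor = strand_split(74, 40)
mt_major, mt_minor = strand_split(35, 27)

print(f"cp genome: {cp_major:.1f}% vs {cp_minor:.1f}%")
print(f"mt genome: {mt_major:.1f}% vs {mt_minor:.1f}%")
```
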
Table 3 Distribution of the mt protein-coding gene and trn gene among trebouxiophycean algae and
chlorophycean algae

              Trebouxiophyceae                                           Chlorophyceae
Gene          ArM0029B  Prototheca  Helicosporidium  Coccomyxa  MX-AZ01  Scenedesmus  Dunaliella  Gonium  Chlamydomonas
atp1          o         o           o                o          o        -            -           -       -
atp4          o         -           o                o          o        -            -           -       -
atp6          o         o           o                o          o        o            -           -       -
atp8          o         o           o                o          o        -            -           -       -
atp9          o         o           o                o          o        o            -           -       -
cob           o         o           o                o          o        o            o           o       o
cox1          o         o           o                o          o        o            o           o       o
cox2          o         o           o                o          o        o            -           -       -
cox3          o         o           o                o          o        o            -           -       -
nad1          o         o           o                o          o        o            o           o       o
nad2          o         o           o                o          o        o            o           o       o
nad3          o         o           o                o          o        o            -           -       -
nad4          o         o           o                o          o        o            o           o       o
nad4L         o         o           o                o          o        o            -           -       -
nad5          o         o           o                o          o        o            o           o       o
nad6          o         o           o                o          o        o            o           o       o
nad7          o         o           o                o          o        -            -           -       -
nad9          o         o           o                o          o        -            -           -       -
rpl5          o         -           o                o          o        -            -           -       -
rpl6          o         o           o                -          -        -            -           -       -
rpl16         o         o           o                o          o        -            -           -       -
rps2          o         o           o                o          o        -            -           -       -
rps3          o         o           o                o          o        -            -           -       -
rps4          o         o           o                o          o        -            -           -       -
rps7          o         o           o                o          o        -            -           -       -
rps10         o         o           o                o          o        -            -           -       -
rps11         o         o           o                -          -        -            -           -       -
rps12         o         o           o                o          o        -            -           -       -
rps13         o         o           o                o          o        -            -           -       -
rps14         o         o           o                o          o        -            -           -       -
rps19         o         o           o                o          o        -            -           -       -
tatC          o         o           o                o          o        -            -           -       -
size          65,049    55,328      49,343           65,497     74,423   42,781       28,331      15,993  15,758


tRNAs

              Trebouxiophyceae                                           Chlorophyceae
Gene          ArM0029B  Prototheca  Helicosporidium  Coccomyxa  MX-AZ01  Scenedesmus  Dunaliella  Gonium  Chlamydomonas
trnA (UGC)    o         o           o                o          o        o            -           -       -
trnC (GCA)    o         o           o                o          o        o            -           -       -
trnD (GUC)    o         o           o                o          o        o            -           -       -
trnE (UUC)    o         o           o                o          o        o            -           -       -
trnF (GAA)    o         o           o                o          o        o            -           -       -
trnG (GCC)    o         o           -                o          o        o            -           -       -
trnG (UCC)    o         o           o                o          o        o            -           -       -
trnH (GUG)    o         o           o                o*         o*       o            -           -       -
trnI (CAU)    o         o           o                o          o        -            -           -       -
trnI (GAU)    o         o           o                o          o        o            -           -       -
trnI (UAU)    -         -           -                -          -        o            -           -       -
trnK (UUU)    o         o           o                o          o        o            -           -       -
trnL (AAG)    -         -           -                -          -        o            -           -       -
trnL (CAA)    o         -           -                o          o        -            -           -       -
trnL (CAG)    -         -           -                -          -        o            -           -       -
trnL (UAA)    o         o           o                o          o        -            -           -       -
trnL (UAG)    o         o           o                o          o        -            -           -       -
trnM (CAU)    oo        oo          oo               oo         o        oo           o           oo      o
trnN (GUU)    o         o           o                o          o        o            -           -       -
trnP (UGG)    o         o           o                o          -        o            -           -       -
trnQ (UUG)    o         o           o                o          o        o            o           o       o
trnR (ACG)    o         o           o                o          -        o            -           -       -
trnR (CCU)    -         -           -                -          -        o            -           -       -
trnR (UCU)    o         o           o                o          o        o            -           -       -
trnS (GCU)    o         o           o                o*         o*       o            -           -       -
trnS (GGA)    -         -           -                -          -        o            -           -       -
trnS (UGA)    o         o           o                o*         o*       -            -           -       -
trnT (UGU)    o         o           o                -          -        -            -           -       -
trnV (UAC)    o         o           o                o          o        o            -           -       -
trnW (CCA)    o         o           o                o*         o*       o            o           o       o
trnW (CUA)    -         -           -                -          -        o            -           -       -
trnY (GUA)    o         o           o                o          o        o            -           -       -
Total number  27        26          25               26         23       27           3           4       3

*gene containing intron; oo, 2 copies.



Gene content and rearrangement of the cp genome 

The plastid genome of ArM0029B contains 79 genes encoding
proteins, 32 tRNA genes, and 3 rRNA genes,
similar to that of C. variabilis NC64A (Table 2). The
ArM0029B plastid gene repertoire differs from that of C.
variabilis NC64A only by the absence of pseudogenes
similar to chlL and of an intronic endonuclease in the psbC
gene, and from that of C. vulgaris C-27 by the absence of
tRNA-Val (UAC) and the minE homolog. ArM0029B has a small
cp genome among these species although it has a similar
number of genes to C. variabilis NC64A, C. vulgaris C-27,
Coccomyxa sp. C-169, and Trebouxiophyceae sp. MX-AZ01
(Tables 1 and 2). The compactness of the cp genome of
ArM0029B is due to short intergenic sequences and fewer
introns. The conserved gene order and rearrangement of
cp genomes among ArM0029B, C. variabilis, and C. vulgaris
are compared in Figure 2. The gene order in the
plastid genome of Chlorella sp. ArM0029B is very similar
to that of C. variabilis NC64A. Rearrangement between the
genomes of ArM0029B and C. variabilis was found
in two regions: trnV and the 15-kb gene cluster
"trnI-ycf20-psaC-trnN-minD-trnR1-chlN-chlL-ccsA-rpl32-cysT-ycf1-psbA"
are present in inverse orientation between
Chlorella sp. ArM0029B and C. variabilis NC64A
(Figure 2 and see Additional file 2: Figure S1). Marked
rearrangement of gene clusters was detected between
ArM0029B and C. vulgaris. Many gene clusters conserved
in green algae [22] are also conserved in Chlorella sp.
ArM0029B (Figure 2). Interestingly, the gene order
"trnC-rpoB-rpoC1-rpoC2-rbcL-rps14" is well conserved among
ArM0029B and two Chlorella spp., C. vulgaris and
C. variabilis (see Additional file 3: Figure S2), but not in the
related species Coccomyxa sp. C-169 and Trebouxiophyceae
sp. MX-AZ01, suggesting that this gene order
may be specific to Chlorella species. The order
of the psbD and psbC genes is conserved, and the two genes are
closely linked in all sequenced Trebouxiophyceae. Interestingly, the 5'
coding region of the psbC gene appears to overlap
with the 3' coding region of psbD on the same strand in
ArM0029B. Such overlap of two genes
occurs frequently in the genomes of viruses, prokaryotes,
mitochondria, and eukaryotes, including humans [23-25].
The overlap of psbD and psbC appears to exist in all of the
Trebouxiophyceae sequenced except for Helicosporidium
sp., which lacks psbD and psbC in the plastid genome. The
psbC gene in Coccomyxa sp. C-169 and Trebouxiophyceae
sp. MX-AZ01 was annotated with Gly as the starting amino
acid, resulting in separation of psbC from psbD. Possible
Gly start codons of psbC are also found a few bases after
psbD in all sequenced Trebouxiophyceae. However, an
ATG or GTG start codon is also found in the 3' coding
region of psbD in those species. In other classes of Viridiplantae,
Oltmannsiellopsis viridis and Pseudendoclonium akinetum
of Ulvophyceae, Nephroselmis olivacea of the prasinophytes,
and Mesostigma viride of Charophyceae also share the
same feature: overlap of psbD-psbC, or a GTG start
codon of psbC without overlap with psbD. However, when
psbC is clearly separated from psbD, as in C. reinhardtii
and Scenedesmus, the N-terminal amino acid
sequence of psbC is Met-Glu-Thr-Leu-Phe-Asn-Gly-Thr(Ser).
The italic amino acids are well conserved and are encoded
in all overlapped sequences of the above species, indicating
that all linked psbD and psbC genes may overlap
in the same manner.

Figure 2 Comparison of the cp genomes of C. variabilis, Chlorella sp. ArM0029B, and C. vulgaris. Thick bars represent protein-coding genes
(blue), rRNA genes (green), and tRNA genes (red). Genes in the inverted region are highlighted in orange (protein-coding genes) and purple
(tRNA genes) boxes in ArM0029B. LCBs and conserved genes are connected by light and dark gray rhomboids, respectively.

Gene content of the mt genome 

The mt genome of ArM0029B contains 32 protein-coding
genes, 27 tRNA genes, and 3 rRNA genes
(Tables 1 and 3). The 32 protein-coding genes include 5
atp genes, 3 cox genes, 9 nad genes, 13 ribosomal protein
genes, and the cob and tatC genes. Helicosporidium, another
trebouxiophycean alga, has the same set
of protein-coding genes. Three other trebouxiophycean
algae, Prototheca, Coccomyxa, and MX-AZ01, have only
30 of the 32 protein-coding genes of ArM0029B and
Helicosporidium (Table 3). Two ribosomal protein genes,
rpl6 and rps11, are absent in Coccomyxa and MX-AZ01.
Prototheca lacks two genes, atp4 and rpl5. It is assumed
that these genes were recently lost from the mt genomes of
the three taxa within Trebouxiophyceae, possibly by transfer to the nucleus.
Compared with trebouxiophycean algae, the chlorophycean
algae, including Scenedesmus, Dunaliella, Gonium,
and Chlamydomonas, do not have any ribosomal protein
genes or tatC, which are found in Trebouxiophyceae
(Table 3). Scenedesmus has the largest content of protein-coding
genes among the chlorophycean algae with 13
protein-coding genes, all shared by trebouxiophycean
algae. The 13 protein-coding genes include 2 atp genes,
3 cox genes, 7 nad genes, and a cob gene. Among the
13 genes, 6 are absent in the other three chlorophycean
algae, Dunaliella, Gonium, and Chlamydomonas. These
three chlorophycean algae contain the other seven
protein-coding genes: cob, cox1, nad1, nad2,
nad4, nad5, and nad6. The ArM0029B mt genome contains
27 tRNA genes, the largest number among trebouxiophycean
algae (Table 3). The tRNA gene content
of other trebouxiophycean algae ranges from 23 to 26.
Both Coccomyxa and MX-AZ01 share introns in four
tRNA genes. In Chlorophyceae, although Scenedesmus
has 27 tRNA genes, Dunaliella, Gonium, and Chlamydomonas
have only three types of tRNA genes, trnM,
trnQ, and trnW. Additional file 4: Figure S3 shows the
characterization of the mt tRNA genes of Chlorella sp.
ArM0029B. Among trebouxiophycean algae, Chlorella
sp. ArM0029B has the largest number of tRNA genes,
and the secondary structure of tRNA genes in trebouxiophycean
algal mitochondria is unknown. Thus far,
Chlorella sp. ArM0029B is known to contain the largest
gene content among the mt genomes of Trebouxiophyceae.
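
The presence and absence pattern of mt protein-coding genes described above lends itself to simple set operations. The sketch below encodes the 32 genes listed in Table 3 and removes the genes reported as missing in each species; the variable names are illustrative, and the data are limited to what the text and Table 3 state for the five trebouxiophycean genomes.

```python
# The 32 mt protein-coding genes listed in Table 3 for ArM0029B.
FULL_SET = set(
    "atp1 atp4 atp6 atp8 atp9 cob cox1 cox2 cox3 "
    "nad1 nad2 nad3 nad4 nad4L nad5 nad6 nad7 nad9 "
    "rpl5 rpl6 rpl16 rps2 rps3 rps4 rps7 rps10 rps11 "
    "rps12 rps13 rps14 rps19 tatC".split()
)

# Gene sets per species, built from the losses reported in the text.
mt_genes = {
    "ArM0029B": set(FULL_SET),
    "Helicosporidium": set(FULL_SET),
    "Prototheca": FULL_SET - {"atp4", "rpl5"},
    "Coccomyxa": FULL_SET - {"rpl6", "rps11"},
    "MX-AZ01": FULL_SET - {"rpl6", "rps11"},
}

# Genes missing from each species relative to the full set.
missing = {sp: sorted(FULL_SET - genes) for sp, genes in mt_genes.items()}

# Genes shared by all five trebouxiophycean mt genomes.
core = set.intersection(*mt_genes.values())

for species, genes in mt_genes.items():
    print(f"{species}: {len(genes)} genes, missing {missing[species]}")
print(f"shared by all five: {len(core)} genes")
```
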

Analysis of conserved gene clusters in the mt genome
is difficult because of limited information regarding
mt genomes in trebouxiophycean algae. No mt genome
of a Chlorella species has been reported except for that of
ArM0029B in this study. It has been reported that the
overall gene order in mt genomes is conserved within
the non-photosynthetic group (Prototheca wickerhamii
and Helicosporidium sp.) and within the high-GC content group
(Coccomyxa sp. C-169 and Trebouxiophyceae sp. MX-AZ01)
[14,17]. The overall gene order of
the mt genome of Chlorella sp. ArM0029B is not conserved
with that of any member of the non-photosynthetic
group or the high-GC group. Nevertheless, the gene orders
"trnS-trnV-trnL" and "trnY-atp8-atp4" are well conserved
in all five species of Trebouxiophyceae.
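
One simple way to look for locally conserved gene order between two otherwise rearranged circular genomes is to compare gene adjacencies, ignoring orientation. The sketch below is an illustration of that idea, not the method used in the study. The two gene orders are hypothetical toy examples that contain the two conserved blocks named in the text, and the function names are assumptions.

```python
from typing import FrozenSet, List, Set


def adjacencies(order: List[str]) -> Set[FrozenSet[str]]:
    """Unordered pairs of neighbouring genes in a circular gene order."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def shared_adjacencies(a: List[str], b: List[str]) -> Set[FrozenSet[str]]:
    """Adjacencies present in both gene orders, in either orientation."""
    return adjacencies(a) & adjacencies(b)


# Hypothetical gene orders (illustration only); the second carries the
# trnS-trnV-trnL block in reverse orientation and is otherwise rearranged.
order_a = ["trnS", "trnV", "trnL", "cox1", "trnY", "atp8", "atp4", "nad5"]
order_b = ["cob", "trnL", "trnV", "trnS", "nad1", "trnY", "atp8", "atp4"]

shared = shared_adjacencies(order_a, order_b)
for pair in sorted(sorted(p) for p in shared):
    print("-".join(pair))
```

In this toy case only the adjacencies inside the two blocks survive, which mirrors the situation described above: the overall order differs while short blocks remain intact.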

Introns in the organellar genomes of ArM0029B 

Two group I introns are found in the organellar genomes
of Chlorella sp. ArM0029B. One resides in trnL (UAA) of
the cp genome, and the other is located in cox1 of the mt
genome. The trnL (UAA) group I intron of cyanobacterial
origin is an ancient self-splicing group I intron in the plastid
genome that has been lost only rarely, in some taxa [26,27]. The
ArM0029B mt genome has the intron between bases 720
and 721 of cox1. Among trebouxiophycean algae, an
intron with the same insertion site is also found in C. vulgaris,
Prototheca, and Helicosporidium but not in Coccomyxa
and MX-AZ01 (Figure 3A). The Chlorella sp.
ArM0029B intron is a cis-splicing intron and has an ORF
starting at loop 6 (L6) and ending at P8-P7 (Figure 3B).
The ORF has two LAGLIDADG endonuclease motifs.
The endonuclease-like ORF of group I introns is
known to have two LAGLIDADG motifs [28]. An ORF
with two LAGLIDADG motifs is also found in the same
intron of C. vulgaris and Prototheca (Figure 3C). Unlike
Chlorella and Prototheca, Helicosporidium has a trans-splicing
intron without an ORF. As shown in Figure 3B,
the disconnection of the intron in Helicosporidium occurs
at loop 8, which contains the ORF, suggesting that the
trans-splicing intron of Helicosporidium might have been derived
from a cis-splicing intron by genomic rearrangement, followed by
loss of the ORF. Compared with the single intron of ArM0029B,
other related species in Trebouxiophyceae contain 3-11
introns in two to six genes of their mt genomes (Table 1),
indicating that the mt genome of ArM0029B has the smallest
number of introns among those reported in Trebouxiophyceae.
An intron or introns split the cox1 gene into
two exons in ArM0029B, four exons in Prototheca wickerhamii,
three exons in Helicosporidium sp., and two exons
with different cognate sites in Trebouxiophyceae sp.
MX-AZ01, and no intron was found in Coccomyxa sp.
C-169. The intron distribution in the plastid genome
of ArM0029B is different from that of other trebouxiophycean
algae: three introns in trnL, psbA, and psbC
of C. variabilis NC64A, three introns in trnL, rrnL,
and chlL of C. vulgaris, one intron in psbB of Coccomyxa
sp. C-169, and five introns in ftsH, psbA, and
rrnL of Trebouxiophyceae sp. MX-AZ01 (Tables 1 and
2). Fewer introns, the lack of pseudogenes, and shorter
intergenic regions contribute to the plastid genome of
ArM0029B being more compact than those of C. variabilis
NC64A and C. vulgaris C-27.

Figure 3 The mt cox1 intron of C. ArM0029B and its distribution among trebouxiophycean algae. A. The exon and intron border
sequences. B. Secondary structures of the ArM0029B cis-splicing intron and the Helicosporidium trans-splicing intron. C. Comparison of the intronic
ORF with two LAGLIDADG motifs.

Phylogenetic affinity of ArM0029B to other 
trebouxiophycean algae 

Phylogenetic relationships among seven trebouxiophycean
algal plastids were investigated using an aligned
10,938-base DNA sequence of six large photosystem genes
(psaA, psaB, psbA, psbB, psbC, and psbD) and rbcL, which
are widely used for phylogenetic studies [29-31].
Phylogenetic analysis with four chlorophycean algae as
the outgroup produced a single plastid maximum parsimony
(MP) tree (Figure 4A), and NJ and ML analyses each
produced a single tree with a similar topology (see
Additional file 5: Figure S4). The four chlorophycean
algae and seven trebouxiophycean algae separated into
two sister clades with 100% bootstrap/jackknife support.
The trebouxiophycean algae separated into two clades:
the Coccomyxa-[MX-AZ01] clade with 100%
bootstrap/jackknife support and the
Chlorella-Parachlorella-Oocystis clade with 84-85%
bootstrap/jackknife support. Within the
Chlorella-Parachlorella-Oocystis clade, ArM0029B and the
two Chlorella spp. formed a clade with 100%
bootstrap/jackknife support, but Parachlorella and
Oocystis clustered without bootstrap/jackknife support.
The distance matrix of the aligned 10,938-base DNA
sequence among the seven trebouxiophycean algae is shown
in Additional file 6: Table S2. The distance ranged from
9.193% to 10.299% between Chlorella species and from
14.071% to 26.705% among trebouxiophycean genera. The
distance between ArM0029B and C. variabilis (9.193%) is
smaller than the distance between ArM0029B and
C. vulgaris C-27 (10.299%) or between C. vulgaris C-27
and C. variabilis (9.403%), indicating a close
relationship of ArM0029B to C. variabilis. The genera
closest to Chlorella spp. were Parachlorella
(14.071-14.585% distance) and Oocystis (15.911-16.435%
distance). Among the other genera, Parachlorella and
Oocystis had a 14.959% distance, and Coccomyxa and
MX-AZ01 had a 16.901% distance. All other pairs of
trebouxiophycean genera differed by more than 20%. These
results indicate that ArM0029B belongs to the genus
Chlorella along with C. vulgaris and C. variabilis and
that C. variabilis is the closest taxon to ArM0029B.
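
The percentage distances quoted above are pairwise values computed from an
aligned sequence. The text does not state which distance measure was used, so
the following is only a minimal sketch that assumes an uncorrected p-distance
(the fraction of differing sites among the sites compared). The function
names and the toy sequences are illustrative and do not come from the study.

```python
from itertools import combinations


def p_distance(seq_a, seq_b, skip_chars="-?N"):
    """Uncorrected p-distance between two aligned sequences.

    Columns where either sequence has a gap or ambiguity code are skipped
    (an assumption of this sketch, not a documented choice of the study).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip_chars or b in skip_chars:
            continue
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def distance_matrix(alignment):
    """Return {(name1, name2): percent distance} for every pair of taxa."""
    return {
        (n1, n2): 100.0 * p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(sorted(alignment), 2)
    }


# Toy alignment (hypothetical sequences, for illustration only).
toy = {
    "taxon_A": "ACGTACGTAC",
    "taxon_B": "ACGTACGTAA",
    "taxon_C": "ACGAACTTAA",
}
for pair, dist in distance_matrix(toy).items():
    print(pair, f"{dist:.1f}%")
```

A model-corrected distance would give larger values than the p-distance for
divergent pairs, so the numbers from such a sketch are comparable with
Table S2 only if the same measure is used.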

The mt genome-based phylogenetic relationships among
five trebouxiophycean algae were also analyzed using the
translated amino acid sequences of seven genes (cob, cox1,
nad1, nad2, nad4, nad5, and nad6) that are shared by
trebouxiophycean and chlorophycean algal mitochondria.
Phylogenetic analysis outgrouped by four chlorophycean



Figure 4 Single maximum parsimony (MP) trees. A. The plastid MP tree from
DNA sequences of rbcL and six photosystem genes. B. The mt MP tree from
translated amino acid sequences of seven protein-coding genes. The
distribution of the cox1 intron at position 721, its splicing form (cis or
trans), and the presence or absence of its ORF are indicated. */*: 100%
bootstrap support/100% jackknife support. -/-: not supported by bootstrap
or jackknife analysis.

algae produced a single MP tree (Figure 4B) and a single
NJ tree with the same topology (see Additional file 5:
Figure S4C). The four chlorophycean algae and five
trebouxiophycean algae separated into two sister clades
with 100% bootstrap/jackknife support. Within the five
trebouxiophycean algae, the MP tree contained one
well-supported clade, the Helicosporidium-Prototheca
clade (100% bootstrap/jackknife support), and three
isolated taxa: ArM0029B, MX-AZ01, and Coccomyxa. These
three taxa clustered together, but the cluster was only
weakly supported (65% bootstrap/67% jackknife). Although
ArM0029B formed a clade with MX-AZ01 and Coccomyxa in
the MP and NJ trees, its mt phylogenomic affinity to
other trebouxiophycean algae remains to be investigated
because few mt genomes are available for
trebouxiophycean algae. Scenedesmus, which contains the
largest number of mt genes among Chlorophyceae, has
ancient mt characteristics among green algae and is
placed as a basal group in the phylogenetic analysis.
Green algae have lost many mt genes via gene transfer to
the nucleus. ArM0029B contains more mt genes than any
other trebouxiophycean alga sequenced to date,
suggesting that its mt genome may retain ancient
characteristics among trebouxiophycean algae. However,
we cannot exclude the possibility that genes were newly
integrated into an ancient-type trebouxiophycean mt
genome that had fewer genes.
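
The bootstrap and jackknife support values quoted in this section come from
repeating the tree search on resampled alignments. The sketch below shows
only the resampling step and the tally, under stated assumptions: the tree
search itself is represented by a caller-supplied function (`clade_found`,
a hypothetical name), and the 50% jackknife deletion fraction is an
assumption, since the text does not report the setting used.

```python
import random


def _alignment_length(alignment):
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    return lengths.pop()


def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (same total length)."""
    length = _alignment_length(alignment)
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}


def jackknife_columns(alignment, rng, delete_fraction=0.5):
    """Delete a random fraction of columns without replacement.

    delete_fraction=0.5 is an assumed value for this sketch.
    """
    length = _alignment_length(alignment)
    n_keep = length - int(length * delete_fraction)
    keep = sorted(rng.sample(range(length), n_keep))
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


def support_percent(alignment, clade_found, resample, replicates=1000, seed=0):
    """Percentage of resampled alignments in which the clade is recovered.

    clade_found(alignment) -> bool must run a tree search and test for the
    clade of interest; it is supplied by the caller and is not implemented here.
    """
    rng = random.Random(seed)
    hits = sum(bool(clade_found(resample(alignment, rng))) for _ in range(replicates))
    return 100.0 * hits / replicates
```

Under this scheme, a value such as 65% means that the clade was recovered in
roughly 650 of 1,000 resampled data sets.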

The Chlorellales in the phylogenies of chloroplasts and
mitochondria and the meaning of the cox1 intron at base
position 721 of the cox1 gene

The Chlorellales is a green algal group lacking flagella
whose members include Chlorella, Parachlorella,
Oocystis, Prototheca, and Helicosporidium. Chlorella and
Parachlorella inhabit freshwater, marine, or terrestrial
environments in coccoid form [32]. Prototheca and
Helicosporidium are non-photosynthetic and parasitic
coccoids. Phylogenetic relationships within the
Chlorellales have been well studied, and nuclear and cp
gene data have provided evidence that the Oocystaceae,
including the semi-colonial Oocystis, form an early
diverging clade within the Chlorellales [1]. Our cp
genome phylogeny is congruent with previous reports and
shows the formation of a strong clade containing
Chlorella, Parachlorella, Oocystis, Prototheca, and
Helicosporidium. The cox1 intron at base position 721 in
the cox1 gene of the mt genome is found only in members
of the Chlorellales within Trebouxiophyceae, i.e.,
Prototheca, Helicosporidium, Chlorella vulgaris, and
Chlorella sp. ArM0029B. This intron distribution
supports the Chlorellales but does not agree with the mt
genome phylogeny. The occurrence of the cox1 intron in
the free-living Arctic Chlorella sp. ArM0029B and
Chlorella vulgaris, as well as in the parasitic coccoids
Prototheca and Helicosporidium, indicates that an intron
of common origin was introduced in early Chlorellales
and that trans-splicing of the intron arose after the
divergence of Prototheca and Helicosporidium in the
parasitic coccoid clade. Limited taxon sampling, high
variation of mt genes, and possible lateral gene
transfer from other taxa might affect the topology of
the phylogenetic tree in the present study. Sampling
more representative taxa of trebouxiophycean algae would
improve the understanding of their evolution.

Conclusions 

Organelle functions, including energy production,
photosynthesis, and metabolite biosynthesis, play an
important role in maintaining an organism's life. The
chloroplast is an organelle for fatty acid/lipid
biosynthesis, and the mitochondrion is an organelle for
fatty acid/lipid degradation. Recently, oil-producing
microalgae have been studied intensively for genetic
improvement, including genomics and genetic engineering.
Chlorella is an important microalga for oil production
[33]. ArM0029B is a Chlorella sp. that originated from
the Arctic region and shows fast growth at various
temperatures and a high oil-accumulating trait [19].

Here, we report the 119,989-bp cp genome and 65,049-bp
mt genome of Arctic Chlorella sp. ArM0029B. The plastid
genome of ArM0029B, which lacks a large IR, is close to
that of C. variabilis NC64A: both species have the same
content of conserved genes and almost the same gene
order. However, a large rearrangement, the inversion of
a 15-kb gene cluster, distinguishes ArM0029B from
C. variabilis NC64A. Major structural changes were
detected in introns and tRNAs in ArM0029B compared with
related species of Trebouxiophyceae. The mt genome of
ArM0029B contains the largest number of genes (62 genes)
and the smallest number of introns (one intron in cox1)
among trebouxiophycean algae. Detailed information
regarding the secondary structure of the tRNA genes was
obtained in this Chlorella mt genome study. Two group I
introns were found in ArM0029B: a self-splicing intron
in trnL (UAA) of the cp genome and an intron in cox1 of
the mt genome containing an ORF encoding an endonuclease
with two LAGLIDADG motifs. Phylogenetic analysis of cp
genomes suggests that the three Chlorella species form a
monophyletic group and that ArM0029B belongs to the
genus Chlorella. The phylogenetic analysis of mt
genomes, with the limited number of mt genomes available
in Trebouxiophyceae, could not determine which of the
four trebouxiophycean mt genomes is closest to that of
ArM0029B. The lowest number of introns in the organelle
genomes of ArM0029B among Chlorella spp. may be due to
the limited chance of intron spreading and invasion
caused by isolation from other related taxa in the
Arctic environment. Based on the gene content, the
ArM0029B organelle genomes seem to have


Jeong et al. BMC Genomics 2014, 15:286 
http://www.biomedcentral.com/1471-2164/15/286 






ancient organelle characteristics, with many genes and
few introns in both genomes.

In the present study, the cp genome phylogeny supports
monophyly of the seven investigated members of
Chlorellales, including three Chlorella spp.,
Parachlorella, Oocystis, Prototheca, and
Helicosporidium. The intron at base position 721 of the
cox1 gene occurs in all four investigated Chlorellales
taxa (Chlorella sp. ArM0029B, Chlorella vulgaris,
Prototheca, and Helicosporidium), suggesting that a
common ancestor of the Chlorellales might have carried
the cox1 intron in a cis-spliced form and that the
cis-splicing intron only recently became trans-spliced
in Helicosporidium. When more mt genomic information
becomes available, we will have a better understanding
of the mt genome phylogeny of the trebouxiophycean
algae.

The unique features of the Chlorella sp. ArM0029B
organelle genomes presented here provide important
information for understanding organellar genome
evolution, including introns, gene rearrangement, and
structural changes of plastid and mt genomes among
species in Trebouxiophyceae and green algae.

Methods 

Strain and culture conditions 

Chlorella sp. ArM0029B [19] was maintained on solidified
TAP medium [34] and cultured in liquid medium for
analysis at 25°C with shaking at 200 rpm under constant
white light (40 µmol m⁻² s⁻¹).

Sequencing, assembly, and annotation of the ArM0029B 
organelle genomes 

The ArM0029B organelle genomes were sequenced as part
of the ArM0029B genome project (funded by Advanced
Biomass R&D Center) using an Illumina HiSeq 2000-based
whole-genome shotgun sequencing approach. The organelle
sequences were obtained using the CLC Genomics Workbench
version 5.5. Two large contigs (65.049 kb and
120.090 kb) with the highest average read coverages
(19,505 and 7,485, respectively) were identified; the
contigs displayed low GC content compared with the high
GC content of the nuclear genome. The circular structure
of each replicon was confirmed by polymerase chain
reaction (PCR) amplification across the contig ends and
by joining the Sanger sequence reads derived from the
amplicons. The assemblies were further verified by
examining paired-end distance and depth after re-mapping
reads onto the contig sequences. BLAST searches
confirmed that the two large contigs correspond to the
mt (65.049 kb) and plastid (120.090 kb) genomes. For
gene annotation of the organelle genomes, ORFs encoding
50 amino acids or longer were identified and searched
against a known protein database (NR). Genes encoding
proteins homologous to known short cp peptides were
identified manually. Glimmer (ver. 3.02) was used to
predict additional putative protein-coding genes [35].
tRNA and rRNA genes were detected using ARAGORN [36] and
RNAmmer 1.2, respectively. The rrnS in the mt genome was
detected based on a BLAST search and the 5S rRNA data
bank [37]. The complete sequences of the ArM0029B
chloroplast and mitochondrion were deposited in GenBank
under the accession numbers KF554427 and KF554428,
respectively.
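
The annotation step above starts from ORFs encoding 50 amino acids or more.
As a rough illustration only, the sketch below scans all six reading frames
of a linear sequence for such ORFs. It assumes ATG start codons and the
standard stop codons (TAA, TAG, TGA), does not handle ORFs spanning the
junction of a circular genome, and is not the tool used in this study; the
function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard code; an assumption of this sketch
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept as-is)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Return ORFs as (strand, frame, start, end, n_aa) tuples.

    start/end are 0-based, end-exclusive coordinates on the scanned strand
    (for strand "-" they refer to the reverse-complemented sequence).
    n_aa counts codons from the ATG up to, but not including, the stop codon.
    """
    orfs = []
    forward = seq.upper()
    for strand, s in (("+", forward), ("-", reverse_complement(forward))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i  # first in-frame start codon opens the ORF
                elif codon in STOP_CODONS:
                    n_aa = (i - start) // 3
                    if n_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3, n_aa))
                    start = None  # look for the next ORF in this frame
    return orfs


# Toy sequence: ATG followed by 60 alanine codons and a stop codon.
toy = "ATG" + "GCT" * 60 + "TAA"
print(find_orfs(toy))  # [('+', 0, 0, 186, 61)]
```

Each ORF found this way would then be translated and searched against a
protein database, as described in the text.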

For comparison with the mt genome of ArM0029B, we used
all four mt genomes reported to date in
Trebouxiophyceae: Prototheca wickerhamii (NC_001613),
Helicosporidium sp. ex Simulium jonesi (NC_017841),
Coccomyxa sp. C-169 (NC_015316), and Trebouxiophyceae
sp. MX-AZ01 (NC_018568). We excluded Parachlorella minor
because of its very low gene content and its arguable
placement in Trebouxiophyceae. For comparison with the
plastid genome of ArM0029B, we selected four reported
species: C. variabilis NC64A (NC_015359), C. vulgaris
C-27 (NC_001865), Coccomyxa sp. C-169 (NC_015084), and
Trebouxiophyceae sp. MX-AZ01 (NC_018569). We did not
include non-photosynthetic Trebouxiophyceae species in
the plastid genome comparison because they lack many
photosynthetic genes.

Comparative analysis of cp genomes 

The complete cp genomes of ArM0029B, C. variabilis
NC64A, and C. vulgaris C-27 were compared using the
MAUVE alignment tool [38] to identify rearrangement-free
locally collinear blocks (LCBs) among the genomes,
yielding 25 LCBs with a minimum weight of 170. The
genome sequence of the C. variabilis chloroplast was
artificially rearranged prior to the MAUVE alignment so
that the genome-level alignments could be maximally
shown. Conserved genes among the three cp genomes were
identified using BLASTN searches. genoPlotR [39] was
then used to visualize conserved genes in the context of
genomes and LCBs.
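
A rearrangement such as the 15-kb gene cluster inversion reported for
ArM0029B relative to C. variabilis appears, at the level of gene order, as a
run of genes in reversed order with flipped strands. The following minimal
sketch tests whether two signed gene orders differ by exactly one such
inversion. It is an illustration only, not part of the MAUVE/genoPlotR
workflow, and the gene names in the example are placeholders rather than the
genes of the actual cluster.

```python
def find_single_inversion(order_a, order_b):
    """Return (start, end) if inverting order_a[start:end] gives order_b.

    Each gene order is a list of (gene_name, strand) tuples with strand
    +1 or -1. Returns None if the orders are identical or differ by
    anything other than one inversion.
    """
    if len(order_a) != len(order_b) or order_a == order_b:
        return None
    # The first and last differing positions bound the candidate segment.
    diffs = [i for i, (a, b) in enumerate(zip(order_a, order_b)) if a != b]
    start, end = diffs[0], diffs[-1] + 1
    # An inversion reverses the gene order and flips every strand.
    inverted = [(gene, -strand) for gene, strand in reversed(order_a[start:end])]
    if order_a[:start] + inverted + order_a[end:] == order_b:
        return (start, end)
    return None


# Placeholder gene orders (hypothetical names, for illustration only).
genome_1 = [("geneA", 1), ("geneB", 1), ("geneC", -1), ("geneD", 1), ("geneE", 1)]
genome_2 = [("geneA", 1), ("geneD", -1), ("geneC", 1), ("geneB", -1), ("geneE", 1)]
print(find_single_inversion(genome_1, genome_2))  # (1, 4)
```

Real chloroplast genomes are circular and usually differ by several
rearrangements, which is why a whole-genome aligner that reports LCBs is
used in practice.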

Secondary structure analyses of intron and mt trn genes 

The secondary structure of the group I intron was
constructed based on the methods of Burke et al. [40]
and Michel and Westhof [41]. For the secondary structure
of mt trn genes, the method of Chuang et al. [42] was
consulted.

Phylogenetic analyses 

The phylogenetic relationships of ArM0029B among
green algae were investigated using both chloroplast and
mitochondrion genomic information. As an ingroup, all
reported trebouxiophycean organellar genomic information
was included. Currently, six cp genomes are available in
Trebouxiophyceae: Chlorella vulgaris C-27 (AB001684.1),
Chlorella variabilis NC64A (NC_015359), Parachlorella
kessleri (NC_012978), Coccomyxa sp. C-169 (NC_015084),
Oocystis solitaria (FJ968739), and Trebouxiophyceae sp.
MX-AZ01 (NC_018569). The four reported trebouxiophycean
algal mt genomes are those of Coccomyxa sp. C-169
(NC_015316), Helicosporidium sp. ex Simulium jonesi
(NC_017841), Prototheca wickerhamii (NC_001613), and
Trebouxiophyceae sp. MX-AZ01 (NC_018568). A partial
clone (AB011523) of the cox1 gene of Chlorella vulgaris
was also used. To avoid bias by taxon sampling, four
chlorophycean algae for which both cp and mt genomes are
known were used as an outgroup: Chlamydomonas
reinhardtii (NC_005353 for cp; NC_001638 for mt), Gonium
pectorale (AP012494 for cp; NC_020437 for mt),
Dunaliella salina (GQ250046 for cp; NC_012930 for mt),
and Scenedesmus obliquus (DQ396875 for cp; X17375 for
mt).

DNA sequences of seven cp protein genes (psaA, psaB,
psbA, psbB, psbC, psbD, and rbcL) were used to construct
the cp phylogenetic MP, NJ, and ML trees using PAUP
ver. 6.0. Bootstrap and jackknife analyses of the MP
tree were performed with 1,000 replicates. Shared gene
content among chlorophycean and trebouxiophycean algal
mitochondria was very limited, and the DNA sequences of
the protein-coding genes were too variable for
successful alignment. Therefore, translated amino acid
sequences of seven protein-coding genes (cob, cox1,
nad1, nad2, nad4, nad5, and nad6) were used for the mt
MP tree. Gapped sequences were not included in the
analysis. Bootstrap and jackknife analyses were also
performed with 1,000 replicates.

Additional files

Red boldface indicates genes with a conserved order. The direction
of the box arrow denotes the transcriptional orientation of the gene.

Additional file 4: Figure S3. Secondary structure of mt trn genes in
Chlorella sp. ArM0029B.

Additional file 5: Figure S4. Single ML (A, HKY85 + G + I model) and
NJ (B) trees from the DNA sequences of seven cp genes and an NJ tree (C)
from translated amino acid sequences of seven mt protein-coding genes.
*/*: 100% bootstrap support/100% jackknife support. -/-: not supported
by bootstrap or jackknife analysis.

Additional file 6: Table S2. Distance matrix of seven cp gene
sequences in Trebouxiophyceae.